AITopics | recognition accuracy

Collaborating Authors

recognition accuracy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

b98249b38337c5088bbc660d8f872d6a-Paper.pdf

Neural Information Processing SystemsFeb-19-2026, 06:23:13 GMT

dataset, domain generalization, generalization, (14 more...)

Neural Information Processing Systems

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

A Hyperspectral Imaging Guided Robotic Grasping System

Sun, Zheng, Dong, Zhipeng, Wang, Shixiong, Chu, Zhongyi, Chen, Fei

arXiv.org Artificial IntelligenceDec-8-2025

Hyperspectral imaging is an advanced technique for precisely identifying and analyzing materials or objects. However, its integration with robotic grasping systems has so far been explored due to the deployment complexities and prohibitive costs. Within this paper, we introduce a novel hyperspectral imaging-guided robotic grasping system. The system consists of PRISM (Polyhedral Reflective Imaging Scanning Mechanism) and the SpectralGrasp framework. PRISM is designed to enable high-precision, distortion-free hyperspectral imaging while simplifying system integration and costs. SpectralGrasp generates robotic grasping strategies by effectively leveraging both the spatial and spectral information from hyperspectral images. The proposed system demonstrates substantial improvements in both textile recognition compared to human performance and sorting success rate compared to RGB-based methods. Additionally, a series of comparative experiments further validates the effectiveness of our system. The study highlights the potential benefits of integrating hyperspectral imaging with robotic grasping systems, showcasing enhanced recognition and grasping capabilities in complex and dynamic environments. The project is available at: https://zainzh.github.io/PRISM.

artificial intelligence, information, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/LRA.2025.3575654

2512.05578

Country: Asia > China (0.28)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Heterogeneous Stroke: Using Unique Vibration Cues to Improve the Wrist-Worn Spatiotemporal Tactile Display

Kim, Taejun, Shim, Youngbo Aram, Lee, Geehyuk

arXiv.org Artificial IntelligenceNov-27-2025

Beyond a simple notification of incoming calls or messages, more complex information such as alphabets and digits can be delivered through spatiotemporal tactile patterns (STPs) on a wrist-worn tactile display (WTD) with multiple tactors. However, owing to the limited skin area and spatial acuity of the wrist, frequent confusions occur between closely located tactors, resulting in a low recognition accuracy. Furthermore, the accuracies reported in previous studies have mostly been measured for a specific posture and could further decrease with free arm postures in real life. Herein, we present Heterogeneous Stroke, a design concept for improving the recognition accuracy of STPs on a WTD. By assigning unique vibrotactile stimuli to each tactor, the confusion between tactors can be reduced. Through our implementation of Heterogeneous Stroke, the alphanumeric characters could be delivered with high accuracy (93.8% for 26 alphabets and 92.4% for 10 digits) across different arm postures.

accuracy, artificial intelligence, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3411764.3445448

2511.16133

Country:

Europe (1.00)
North America > United States (0.49)
Asia > Japan > Honshū > Kantō (0.29)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.70)
Information Technology > Hardware (0.68)

Add feedback

Do Deep Neural Networks Suffer from Crowding?

Anna Volokitin, Gemma Roig, Tomaso A. Poggio

Neural Information Processing SystemsNov-21-2025, 12:51:24 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, flanker, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Singapore (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Add feedback

A Comparative Analysis of Recurrent and Attention Architectures for Isolated Sign Language Recognition

Alishzade, Nigar, Abdullayeva, Gulchin

arXiv.org Artificial IntelligenceNov-18-2025

This study presents a systematic comparative analysis of recurrent and attention-based neural architectures for isolated sign language recognition. We implement and evaluate two representative models-ConvLSTM and Vanilla Transformer-on the Azerbaijani Sign Language Dataset (AzSLD) and the Word-Level American Sign Language (WLASL) dataset. Our results demonstrate that the attention-based Vanilla Transformer consistently outperforms the recurrent ConvLSTM in both Top-1 and Top-5 accuracy across datasets, achieving up to 76.8% Top-1 accuracy on AzSLD and 88.3% on WLASL. The ConvLSTM, while more computationally efficient, lags in recognition accuracy, particularly on smaller datasets. These findings highlight the complementary strengths of each paradigm: the Transformer excels in overall accuracy and signer independence, whereas the ConvLSTM offers advantages in computational efficiency and temporal modeling. The study provides a nuanced analysis of these trade-offs, offering guidance for architecture selection in sign language recognition systems depending on application requirements and resource constraints.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/PCI66488.2025.11219827

2511.13126

Country: Asia > Azerbaijan (0.29)

Genre: Research Report > New Finding (0.87)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.95)

Add feedback

Context-Aware Dynamic Chunking for Streaming Tibetan Speech Recognition

Wang, Chao, Cai, Yuqing, Duojie, Renzeng, Zhang, Jin, Liu, Yutong, Tashi, Nyima

arXiv.org Artificial IntelligenceNov-13-2025

ABSTRACT In this work, we propose a streaming speech recognition framework for Amdo Tibetan, built upon a hybrid CTC/Atten-tion architecture with a context-aware dynamic chunking mechanism. The proposed strategy adaptively adjusts chunk widths based on encoding states, enabling flexible receptive fields, cross-chunk information exchange, and robust adaptation to varying speaking rates, thereby alleviating the context truncation problem of fixed-chunk methods. To further capture the linguistic characteristics of Tibetan, we construct a lexicon grounded in its orthographic principles, providing linguistically motivated modeling units. During decoding, an external language model is integrated to enhance semantic consistency and improve recognition of long sentences. Experimental results show that the proposed framework achieves a word error rate (WER) of 6.23% on the test set, yielding a 48.15% relative improvement over the fixed-chunk baseline, while significantly reducing recognition latency and maintaining performance close to global decoding.

machine learning, natural language, recognition, (16 more...)

arXiv.org Artificial Intelligence

2511.09085

Country: Asia > China > Tibet Autonomous Region (0.46)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.89)

Add feedback

Speech Emotion Recognition with Phonation Excitation Information and Articulatory Kinematics

Zhang, Ziqian, Huang, Min, Xiao, Zhongzhe

arXiv.org Artificial IntelligenceNov-12-2025

Speech emotion recognition (SER) has advanced significantly for the sake of deep-learning methods, while textual information further enhances its performance. However, few studies have focused on the physiological information during speech production, which also encompasses speaker traits, including emotional states. To bridge this gap, we conducted a series of experiments to investigate the potential of the phonation excitation information and articulatory kinematics for SER. Due to the scarcity of training data for this purpose, we introduce a portrayed emotional dataset, STEM-E2VA, which includes audio and physiological data such as electroglottography (EGG) and electromagnetic articulography (EMA). EGG and EMA provide information of phonation excitation and articulatory kinematics, respectively. Additionally, we performed emotion recognition using estimated physiological data derived through inversion methods from speech, instead of collected EGG and EMA, to explore the feasibility of applying such physiological information in real-world SER. Experimental results confirm the effectiveness of incorporating physiological information about speech production for SER and demonstrate its potential for practical use in real-world scenarios.

machine learning, natural language, recognition, (18 more...)

arXiv.org Artificial Intelligence

2511.07955

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing

Wang, Mengqi, Liu, Zhan, Jin, Zengrui, Sun, Guangzhi, Zhang, Chao, Woodland, Philip C.

arXiv.org Artificial IntelligenceOct-10-2025

Diffusion-based large language models (DLLMs) have recently attracted growing interest as an alternative to autoregressive decoders. In this work, we present an empirical study on using the diffusion-based large language model LLaDA for automatic speech recognition (ASR). We first investigate its use as an external deliberation-based processing module for Whisper-LLaMA transcripts. By leveraging the bidirectional attention and denoising capabilities of LLaDA, we explore random masking, low-confidence masking, and semi-autoregressive strategies, showing that Whisper-LLaDA substantially reduces WER compared with the baseline. On LibriSpeech, the best cascade system achieves 2.25%/4.94% WER on test-clean/test-other, representing a 12.3% relative improvement over the Whisper-LLaMA baseline on the test-other split. In contrast, a plain-text LLaDA without acoustic features fails to improve accuracy, highlighting the importance of audio-conditioned embeddings. We further evaluate Whisper-LLaDA as a standalone decoder for ASR with diffusion-based and semi-autoregressive decoding. Most experimental configurations achieve faster inference than the Whisper-LLaMA baseline, although recognition accuracy is slightly lower. These findings offer an empirical view of diffusion-based LLMs for ASR and point to promising directions for improvements.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.16622

Country:

Asia (0.29)
North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

LRFusionPR: A Polar BEV-Based LiDAR-Radar Fusion Network for Place Recognition

Qi, Zhangshuo, Cheng, Luqi, Zhou, Zijie, Xiong, Guangming

arXiv.org Artificial IntelligenceOct-3-2025

In autonomous driving, place recognition is critical for global localization in GPS-denied environments. LiDAR and radar-based place recognition methods have garnered increasing attention, as LiDAR provides precise ranging, whereas radar excels in adverse weather resilience. However, effectively leveraging LiDAR-radar fusion for place recognition remains challenging. The noisy and sparse nature of radar data limits its potential to further improve recognition accuracy. In addition, heterogeneous radar configurations complicate the development of unified cross-modality fusion frameworks. In this paper, we propose LRFusionPR, which improves recognition accuracy and robustness by fusing LiDAR with either single-chip or scanning radar. Technically, a dual-branch network is proposed to fuse different modalities within the unified polar coordinate bird's eye view (BEV) representation. In the fusion branch, cross-attention is utilized to perform cross-modality feature interactions. The knowledge from the fusion branch is simultaneously transferred to the distillation branch, which takes radar as its only input to further improve the robustness. Ultimately, the descriptors from both branches are concatenated, producing the multimodal global descriptor for place retrieval. Extensive evaluations on multiple datasets demonstrate that our LRFusionPR achieves accurate place recognition, while maintaining robustness under varying weather conditions. Our open-source code will be released at https://github.com/QiZS-BIT/LRFusionPR.

artificial intelligence, machine learning, recognition, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/LRA.2025.3614062

2504.19186

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Industry: Information Technology (0.49)

Technology: